NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

The Omnivisor: A Real-Time Static Partitioning Hypervisor Extension for Heterogeneous Core Virtualization over MPSoCs

https://doi.org/10.4230/LIPIcs.ECRTS.2024.7

Ottaviano, Daniele; Ciraolo, Francesco; Mancuso, Renato; Cinque, Marcello (July 2024, Schloss Dagstuhl – Leibniz-Zentrum für Informatik)
Pellizzoni, Rodolfo (Ed.)
Following the needs of industrial applications, virtualization has emerged as one of the most effective approaches for the consolidation of mixed-criticality systems while meeting tight constraints in terms of space, weight, power, and cost (SWaP-C). In embedded platforms with homogeneous processors, a wealth of works have proposed designs and techniques to enforce spatio-temporal isolation by leveraging well-understood virtualization support. Unfortunately, achieving the same goal on heterogeneous MultiProcessor Systems-on-Chip (MPSoCs) has been largely overlooked. Modern hypervisors are designed to operate exclusively on main cores, with little or no consideration given to other co-processors within the system, such as small microcontroller-level CPUs or soft-cores deployed on programmable logic (FPGA). Typically, hypervisors consider co-processors as I/O devices allocated to virtual machines that run on primary cores, yielding full control and responsibility over them. Nevertheless, inadequate management of these resources can lead to spatio-temporal isolation issues within the system. In this paper, we propose the Omnivisor model as a paradigm for the holistic management of heterogeneous platforms. The model generalizes the features of real-time static partitioning hypervisors to enable the execution of virtual machines on processors with different Instruction Set Architectures (ISAs) within the same MPSoC. Moreover, the Omnivisor ensures temporal and spatial isolation between virtual machines by integrating and leveraging a variety of hardware and software protection mechanisms. The presented approach not only expands the scope of virtualization in MPSoCs but also enhances the overall system reliability and real-time performance for mixed-criticality applications. A full open-source reference implementation of the Omnivisor based on the Jailhouse hypervisor is provided, targeting ARM real-time processing units and RISC-V soft-cores on FPGA. Experimental results on real hardware show the benefits of the solution, including enabling the seamless launch of virtual machines on different ISAs and extending spatial/temporal isolation to heterogenous cores with enhanced regulation policies.
more » « less
Full Text Available
Shared Resource Contention in MCUs: A Reality Check and the Quest for Timeliness

https://doi.org/10.4230/LIPIcs.ECRTS.2024.5

Oliveira, Daniel; Chen, Weifan; Pinto, Sandro; Mancuso, Renato (July 2024, Schloss Dagstuhl – Leibniz-Zentrum für Informatik)
Pellizzoni, Rodolfo (Ed.)
Microcontrollers (MCUs) are steadily embracing multi-core technology to meet growing performance demands. This trend marks a shift from their traditionally simple, deterministic designs to more complex and inherently less predictable architectures. While shared resource contention is well-studied in mid to high-end embedded systems, the emergence of multi-core architectures in MCUs introduces unique challenges and characteristics that existing research has not fully explored. In this paper, we conduct an in-depth investigation of both mainstream and next-generation MCU-based platforms, aiming to identify the sources of contention on systems typically lacking these problems. We empirically demonstrate substantial contention effects across different MCU architectures (i.e., from single- to multi-core configurations), highlighting significant application slowdowns. Notably, we observe that slowdowns can reach several orders of magnitude, with the most extreme cases showing up to a 3800x (times, not percent) increase in execution time. To address these issues, we propose and evaluate muTPArtc, a novel mechanism designed for Timely Progress Assessment (TPA) and TPA-based runtime control specifically tailored to MCUs. muTPArtc is an MCU-specialized TPA-based mechanism that leverages hardware facilities widely available in commercial off-the-shelf MCUs (i.e., hardware breakpoints and cycle counters) to successfully monitor applications' progress, detect, and mitigate timing violations. Our results demonstrate that muTPArtc effectively manages performance degradation due to interference, requiring only minimal modifications to the build pipeline and no changes to the source code of the target application, while incurring minor overheads.
more » « less
Full Text Available
Predictable GPU Sharing in Component-Based Real-Time Systems

https://doi.org/10.4230/LIPICS.ECRTS.2024.15

Ali, Syed W; Tong, Zelin; Goh, Joseph; Anderson, James H (January 2024, Schloss Dagstuhl – Leibniz-Zentrum für Informatik)
Pellizzoni, Rodolfo (Ed.)
This paper presents a real-time locking protocol whose design was motivated by the goal of enabling safe GPU sharing in time-sliced component-based systems. This locking protocol enables a GPU to be shared concurrently across, and utilized within, isolated components with predictable execution times. It relies on a novel resizing technique where GPU work is dimensioned on-the-fly to run on partitions of an NVIDIA GPU. This technique can be applied to any component that internally utilizes global CPU scheduling. The proposed locking protocol enables increased GPU parallelism and reduces GPU capacity loss with analytically provable benefits.
more » « less
Full Text Available
Open Problem Resolved: The "Two" in Existing Multiprocessor PI-Blocking Bounds Is Fundamental

https://doi.org/10.4230/LIPICS.ECRTS.2024.11

Ahmed, Shareef; Anderson, James H (January 2024, Schloss Dagstuhl – Leibniz-Zentrum für Informatik)
Pellizzoni, Rodolfo (Ed.)
The goal of a real-time locking protocol is to reduce any priority-inversion blocking (pi-blocking) a task may incur while waiting to access a shared resource. For mutual-exclusion sharing on an m-processor platform, the best existing lower bound on per-task pi-blocking under suspension-oblivious analysis is a (trivial) lower bound of (m-1) request lengths under any job-level fixed-priority (JLFP) scheduler. Surprisingly, most asymptotically optimal locking protocols achieve a per-task pi-blocking upper bound of (2m-1) request lengths under JLFP scheduling, even though a range of very different mechanisms are used in these protocols. This paper closes the gap between these existing lower and upper bounds by establishing a lower bound of (2m-2) request lengths under global fixed-priority (G-FP) and global earliest-deadline-first (G-EDF) scheduling. This paper also shows that worst-case per-task pi-blocking can be arbitrarily close to (2m-1) request lengths for locking protocols that satisfy a certain property that is met by most (if not all) existing locking protocols. These results imply that most known asymptotically optimal locking protocols are almost truly optimal (not just asymptotic) under G-FP and G-EDF scheduling.
more » « less
Full Text Available
Autonomy Today: Many Delay-Prone Black Boxes

https://doi.org/10.4230/LIPICS.ECRTS.2024.12

Liu, Sizhe; Wagle, Rohan; Anderson, James H; Yang, Ming; Zhang, Chi; Li, Yunhua (January 2024, Schloss Dagstuhl – Leibniz-Zentrum für Informatik)
Pellizzoni, Rodolfo (Ed.)
Machine-learning (ML) technology has been a key enabler in the push towards realizing ever more sophisticated autonomous-driving features. In deploying such technology, the automotive industry has relied heavily on using "black-box" software and hardware components that were originally intended for non-safety-critical contexts, without a full understanding of their real-time capabilities. A prime example of such a component is CUDA, which is fundamental to the acceleration of ML algorithms using NVIDIA GPUs. In this paper, evidence is presented demonstrating that CUDA can cause unbounded task delays. Such delays are the result of CUDA’s usage of synchronization mechanisms in the POSIX thread (pthread) library, so the latter is implicated as a delay-prone component as well. Such synchronization delays are shown to be the source of a system failure that occurred in an actual autonomous vehicle system during testing at WeRide. Motivated by these findings, a broader experimental study is presented that demonstrates several real-time deficiencies in CUDA, the glibc pthread library, Linux, and the POSIX interface of the safety-certified QNX Operating System for Safety. Partial mitigations for these deficiencies are presented and further actions are proposed for real-time researchers and developers to integrate more complete mitigations.
more » « less
Full Text Available
GCAPS: GPU Context-Aware Preemptive Priority-Based Scheduling for Real-Time Tasks

https://doi.org/10.4230/LIPIcs.ECRTS.2024.14

Wang, Yidi; Liu, Cong; Wong, Daniel; Kim, Hyoseung (January 2024, Schloss Dagstuhl – Leibniz-Zentrum für Informatik)
Pellizzoni, Rodolfo (Ed.)
Scheduling real-time tasks that utilize GPUs with analyzable guarantees poses a significant challenge due to the intricate interaction between CPU and GPU resources, as well as the complex GPU hardware and software stack. While much research has been conducted in the real-time research community, several limitations persist, including the absence or limited availability of GPU-level preemption, extended blocking times, and/or the need for extensive modifications to program code. In this paper, we propose GCAPS, a GPU Context-Aware Preemptive Scheduling approach for real-time GPU tasks. Our approach exerts control over GPU context scheduling at the device driver level and enables preemption of GPU execution based on task priorities by simply adding one-line macros to GPU segment boundaries. In addition, we provide a comprehensive response time analysis of GPU-using tasks for both our proposed approach as well as the default Nvidia GPU driver scheduling that follows a work-conserving round-robin policy. Through empirical evaluations and case studies, we demonstrate the effectiveness of the proposed approaches in improving taskset schedulability and response time. The results highlight significant improvements over prior work as well as the default scheduling approach, with up to 40% higher schedulability, while also achieving predictable worst-case behavior on Nvidia Jetson embedded platforms.
more » « less
Full Text Available
DeepTrust^RT: Confidential Deep Neural Inference Meets Real-Time!

https://doi.org/10.4230/LIPIcs.ECRTS.2024.13

Babar, Mohammad Fakhruddin; Hasan, Monowar (January 2024, Schloss Dagstuhl – Leibniz-Zentrum für Informatik)
Pellizzoni, Rodolfo (Ed.)
Deep Neural Networks (DNNs) are becoming common in "learning-enabled" time-critical applications such as autonomous driving and robotics. One approach to protect DNN inference from adversarial actions and preserve model privacy/confidentiality is to execute them within trusted enclaves available in modern processors. However, running DNN inference inside limited-capacity enclaves while ensuring timing guarantees is challenging due to (a) large size of DNN workloads and (b) extra switching between "normal" and "trusted" execution modes. This paper introduces new time-aware scheduling schemes - DeepTrust^RT - to securely execute deep neural inferences for learning-enabled real-time systems. We first propose a variant of EDF (called DeepTrust^RT-LW) that slices each DNN layer and runs them sequentially in the enclave. However, due to extra context switch overheads of individual layer slices, we further introduce a novel layer fusion technique (named DeepTrust^RT-FUSION). Our proposed scheme provides hard real-time guarantees by fusing multiple layers of DNN workload from multiple tasks; thus allowing them to fit and run concurrently within the enclaves while maintaining real-time guarantees. We implemented and tested DeepTrust^RT ideas on the Raspberry Pi platform running OP-TEE+DarkNet-TZ DNN APIs and three DNN workloads (AlexNet-squeezed, Tiny Darknet, YOLOv3-tiny). Compared to the layer-wise partitioning approach (DeepTrust^RT-LW), DeepTrust^RT-FUSION can schedule up to 3x more tasksets and reduce context switches by up to 11.12x. We further demonstrate the efficacy of DeepTrust^RT using a flight controller (ArduPilot) case study and find that DeepTrust^RT-FUSION retains real-time guarantees where DeepTrust^RT-LW becomes unschedulable.
more » « less
Full Text Available

Search for: All records